 triplet network


Automated Knot Detection and Pairing for Wood Analysis in the Timber Industry

Lin, Guohao, Pan, Shidong, Khanbayov, Rasul, Yang, Changxi, Khaloian-Sarnaghi, Ani, Kovryga, Andriy

arXiv.org Artificial Intelligence

Knots in wood are critical to both aesthetics and structural integrity, making their detection and pairing essential in timber processing. However, traditional manual annotation is labor-intensive and inefficient, necessitating automation. This paper proposes a lightweight and fully automated pipeline for knot detection and pairing based on machine learning techniques. In the detection stage, high-resolution surface images of wooden boards were collected using industrial-grade cameras, and a large-scale dataset was manually annotated and preprocessed. After transfer learning, YOLOv8l achieves an mAP@0.5 of 0.887. In the pairing stage, detected knots were analyzed and paired based on multidimensional feature extraction. A triplet neural network was used to map the features into a latent space, enabling clustering algorithms to identify and pair corresponding knots. The triplet network with learnable weights achieved a pairing accuracy of 0.85. Further analysis revealed that the distances from the knot's start and end points to the bottom of the wooden board, and the longitudinal coordinates, play crucial roles in achieving high pairing accuracy. Our experiments validate the effectiveness of the proposed solution, demonstrating the potential of AI in advancing wood science and industry.
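As a point of reference for the entries in this list, the standard triplet margin loss that such networks typically minimize can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the Euclidean distance, and the default margin of 1.0 are assumptions, and the embeddings are plain Python lists rather than network outputs.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is farther from the anchor than
    the positive by at least `margin` (an assumed hyperparameter).
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# A well-separated triplet incurs no loss; a swapped one is penalized.
easy = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])
hard = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])
```

Minimizing this quantity over many triplets pulls matching items (here, corresponding knots) together in the latent space, which is what makes a subsequent clustering step workable.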


Robust Clustering on High-Dimensional Data with Stochastic Quantization

Kozyriev, Anton, Norkin, Vladimir

arXiv.org Artificial Intelligence

This paper addresses the limitations of traditional vector quantization (clustering) algorithms, particularly K-Means and its variant K-Means++, and explores the Stochastic Quantization (SQ) algorithm as a scalable alternative for high-dimensional unsupervised and semi-supervised learning problems. Some traditional clustering algorithms suffer from inefficient memory utilization during computation, necessitating the loading of all data samples into memory, which becomes impractical for large-scale datasets. While variants such as Mini-Batch K-Means partially mitigate this issue by reducing memory usage, they lack robust theoretical convergence guarantees due to the non-convex nature of clustering problems. In contrast, the Stochastic Quantization algorithm provides strong theoretical convergence guarantees, making it a robust alternative for clustering tasks. We demonstrate the computational efficiency and rapid convergence of the algorithm on an image classification problem with partially labeled data, comparing model accuracy across various ratios of labeled to unlabeled data. To address the challenge of high dimensionality, we trained a Triplet Network to encode images into low-dimensional representations in a latent space, which serve as a basis for comparing the efficiency of both the Stochastic Quantization algorithm and traditional quantization algorithms. Furthermore, we enhance the algorithm's convergence speed by introducing modifications with an adaptive learning rate.
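The memory argument above can be illustrated with a generic one-sample-at-a-time quantization update. The sketch below is an online, stochastic-gradient-style step in the spirit of online K-Means; the abstract does not give the paper's exact SQ update rule or step-size schedule, so the function name, the fixed learning rate, and the squared-Euclidean objective are all assumptions.

```python
def stochastic_update(centers, sample, lr):
    """Move the center nearest to `sample` a fraction `lr` toward it.

    Only one sample is held in memory per step, in contrast to batch
    K-Means, which needs the full dataset for every iteration.
    Returns the index of the updated center.
    """
    nearest = min(
        range(len(centers)),
        key=lambda i: sum((c - x) ** 2 for c, x in zip(centers[i], sample)),
    )
    centers[nearest] = [c + lr * (x - c) for c, x in zip(centers[nearest], sample)]
    return nearest


# Usage: stream samples one by one; centers are updated in place.
centers = [[0.0, 0.0], [10.0, 10.0]]
updated = stochastic_update(centers, [1.0, 1.0], lr=0.5)
```

An adaptive learning rate, as mentioned in the abstract, would replace the fixed `lr` with a per-step value; how the paper chooses it is not stated here.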


Classification of freshwater snails of the genus Radomaniola with multimodal triplet networks

Vetter, Dennis, Ahsan, Muhammad, Delicado, Diana, Neubauer, Thomas A., Wilke, Thomas, Roig, Gemma

arXiv.org Artificial Intelligence

In this paper, we present our first proposal of a machine learning system for the classification of freshwater snails of the genus Radomaniola. We elaborate on the specific challenges encountered during system design, and how we tackled them; namely a small, very imbalanced dataset with a high number of classes and high visual similarity between classes. We then show how we employed triplet networks and the multiple input modalities of images, measurements, and genetic information to overcome these challenges and reach a performance comparable to that of a trained domain expert.


Hierarchical localization with panoramic views and triplet loss functions

Alfaro, Marcos, Cabrera, Juan José, Jiménez, Luis Miguel, Reinoso, Óscar, Payá, Luis

arXiv.org Artificial Intelligence

The main objective of this paper is to address the mobile robot localization problem with Triplet Convolutional Neural Networks and to test their robustness against changes in lighting conditions. We have used omnidirectional images from real indoor environments, captured in dynamic conditions and converted to panoramic format. Two approaches are proposed to address localization by means of triplet neural networks. First, hierarchical localization, which consists of estimating the robot position in two stages: a coarse localization, which involves a room retrieval task, and a fine localization, which is addressed by means of image retrieval in the previously selected room. Second, global localization, which consists of estimating the position of the robot inside the entire map in a single step. In addition, an exhaustive study of the influence of the loss function on the network learning process has been carried out. The experimental section shows that triplet neural networks are an efficient and robust tool to address the localization of mobile robots in indoor environments under real operating conditions.
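The difference between the two approaches can be sketched with nearest-neighbour retrieval over precomputed descriptors. This is an illustrative sketch only: the data layout (one representative descriptor per room, plus a map from position to image descriptor inside each room), the function names, and the squared-Euclidean distance are assumptions, and the descriptors would in practice come from the trained network.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two descriptors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nearest(query, descriptors):
    """Return the key whose descriptor is closest to `query`."""
    return min(descriptors, key=lambda k: sq_dist(query, descriptors[k]))


def hierarchical_localize(query, room_descriptors, images_by_room):
    """Coarse step: retrieve the room. Fine step: retrieve an image in that room."""
    room = nearest(query, room_descriptors)
    position = nearest(query, images_by_room[room])
    return room, position


def global_localize(query, images_by_room):
    """Single step: retrieve the closest image over the entire map."""
    flat = {
        (room, pos): desc
        for room, images in images_by_room.items()
        for pos, desc in images.items()
    }
    return nearest(query, flat)


rooms = {"kitchen": [0.0, 0.0], "lab": [10.0, 10.0]}
images = {
    "kitchen": {(0, 0): [0.0, 1.0], (1, 0): [1.0, 0.0]},
    "lab": {(5, 5): [9.0, 9.0], (6, 5): [11.0, 10.0]},
}
estimate = hierarchical_localize([10.5, 10.0], rooms, images)
```

The hierarchical variant compares the query against far fewer images in the fine step, at the cost of depending on the coarse room decision being correct.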


Ugly Ducklings or Swans: A Tiered Quadruplet Network with Patient-Specific Mining for Improved Skin Lesion Classification

Naranpanawa, Nathasha, Soyer, H. Peter, Mothershaw, Adam, Kulatilleke, Gayan K., Ge, Zongyuan, Betz-Stablein, Brigid, Chandra, Shekhar S.

arXiv.org Artificial Intelligence

An ugly duckling is an obviously different skin lesion from surrounding lesions of an individual, and the ugly duckling sign is a criterion used to aid in the diagnosis of cutaneous melanoma by differentiating between highly suspicious and benign lesions. However, the appearance of pigmented lesions can change drastically from one patient to another, resulting in difficulties in visual separation of ugly ducklings. Hence, we propose DMT-Quadruplet - a deep metric learning network to learn lesion features at two tiers - patient-level and lesion-level. We introduce a patient-specific quadruplet mining approach together with a tiered quadruplet network, to drive the network to learn more contextual information both globally and locally between the two tiers. We further incorporate a dynamic margin within the patient-specific mining to allow more useful quadruplets to be mined within individuals. Comprehensive experiments show that our proposed method outperforms traditional classifiers, achieving 54% higher sensitivity than a baseline ResNet18 CNN and 37% higher than a naive triplet network in classifying ugly duckling lesions. Visualisation of the data manifold in the metric space further illustrates that DMT-Quadruplet is capable of successfully classifying ugly duckling lesions in both a patient-specific and a patient-agnostic manner.


Self-Supervised Anomaly Detection of Rogue Soil Moisture Sensors

Deforce, Boje, Baesens, Bart, Diels, Jan, Asensio, Estefanía Serral

arXiv.org Artificial Intelligence

IoT data is a central element in the successful digital transformation of agriculture. However, IoT data comes with its own set of challenges, such as the risk of data contamination due to rogue sensors. A sensor is considered rogue when it provides incorrect measurements over time. To ensure correct analytical results, an essential preprocessing step when working with IoT data is the detection of such rogue sensors. Existing methods assume that well-behaving sensors are known or that a large majority of the sensors are well-behaving. However, real-world data is often completely unlabeled and voluminous, calling for self-supervised methods that can detect rogue sensors without prior information. We present a self-supervised anomalous sensor detector based on a neural network with a contrastive loss, followed by DBSCAN. A core contribution of our paper is the use of Dynamic Time Warping in the negative sampling for the triplet loss. This novelty makes the use of triplet networks feasible for anomalous sensor detection. Our method shows promising results on a challenging dataset of soil moisture sensors deployed in multiple pear orchards.
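To make the role of Dynamic Time Warping concrete, the sketch below implements the classic DTW distance and uses it to choose a negative sample. The abstract does not say how DTW enters the sampling, so picking the candidate series with the largest DTW distance to the anchor is an assumption made purely for illustration, as are the function names.

```python
def dtw_distance(a, b):
    """Classic dynamic-programming DTW with absolute difference as local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def pick_negative(anchor, candidates):
    """Index of the candidate most dissimilar to the anchor under DTW (assumed rule)."""
    return max(range(len(candidates)), key=lambda i: dtw_distance(anchor, candidates[i]))


# A time-stretched copy of the anchor has DTW distance 0, so it is never
# chosen as a negative when a genuinely different series is available.
anchor = [1.0, 2.0, 3.0]
candidates = [[1.0, 2.0, 2.0, 3.0], [5.0, 5.0, 5.0]]
negative_index = pick_negative(anchor, candidates)
```

Because DTW tolerates shifts and stretches in time, two sensors that record the same moisture pattern with a delay are treated as similar, which a pointwise distance would not do.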


Meta-Learning Triplet Network with Adaptive Margins for Few-Shot Named Entity Recognition

Han, Chengcheng, Zhu, Renyu, Kuang, Jun, Chen, FengJiao, Li, Xiang, Gao, Ming, Cao, Xuezhi, Wu, Wei

arXiv.org Artificial Intelligence

Meta-learning methods have been widely used in few-shot named entity recognition (NER), especially prototype-based methods. However, the Other (O) class is difficult to represent with a prototype vector because it generally contains a large number of samples with miscellaneous semantics. To solve this problem, we propose MeTNet, which generates prototype vectors for entity types only and not for the O class. We design an improved triplet network to map samples and prototype vectors into a low-dimensional space where they are easier to classify, and propose an adaptive margin for each entity type. The margin acts as a radius and controls a region of adaptive size in the low-dimensional space. Based on these regions, we propose a new inference procedure to predict the label of a query instance. We conduct extensive experiments in both in-domain and cross-domain settings to show the superiority of MeTNet over other state-of-the-art methods. In particular, we release a Chinese few-shot NER dataset, FEW-COMM, extracted from a well-known e-commerce platform. To the best of our knowledge, this is the first Chinese few-shot NER dataset. All the datasets and code are provided at https://github.com/hccngu/MeTNet.
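The idea of a margin acting as a radius can be sketched as a region-based decision rule. This is a hedged illustration rather than MeTNet's actual inference procedure: the function names, the Euclidean distance, and the tie-breaking choice (closest prototype among the regions that contain the query) are assumptions.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def predict_label(query, prototypes, margins):
    """Assign `query` to an entity type only if it lies inside that type's region.

    `prototypes` maps entity type -> prototype vector; `margins` maps
    entity type -> radius. A query outside every region is labeled "O",
    so the O class needs no prototype of its own.
    """
    best_label, best_dist = "O", float("inf")
    for label, proto in prototypes.items():
        dist = euclidean(query, proto)
        if dist <= margins[label] and dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


prototypes = {"PER": [0.0, 0.0], "LOC": [5.0, 0.0]}
margins = {"PER": 1.0, "LOC": 2.0}  # per-type radii; learned in the paper, fixed here
labels = [predict_label(q, prototypes, margins) for q in ([0.5, 0.0], [4.0, 0.0], [2.5, 0.0])]
```

A larger margin gives an entity type a larger region, which is one way to accommodate types whose samples are more spread out in the embedding space.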


Content-based Music Similarity with Triplet Networks

Cleveland, Joseph, Cheng, Derek, Zhou, Michael, Joachims, Thorsten, Turnbull, Douglas

arXiv.org Artificial Intelligence

In this paper, we explore the feasibility of using Triplet networks, a variant of Siamese networks (Bromley et al., 1994), for content-based music recommendation. In this context, a Triplet network learns an embedding of an item such that the item is close to other similar items and far from dissimilar items in the embedding space. To train the network, we will consider songs by the same artist to be similar and songs by all other artists to be dissimilar. Our network is trained using triplets of songs such that two songs by the same artist are embedded closer to one another than to a third song by a different artist. We compare two models that are trained using different ways of picking this third song: at random vs. based on shared genre labels. Our experiments are conducted using songs from the Free Music Archive and use standard audio features. The initial results ...
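The artist-based triplet construction described in this entry can be sketched as follows, covering only the random choice of the third song. The data layout (a dictionary from artist to song identifiers) and the function name are hypothetical; the genre-based variant would restrict the negative to candidates chosen by genre labels, which this entry does not detail.

```python
import random


def sample_triplet(songs_by_artist, rng):
    """Return (anchor, positive, negative) song identifiers.

    Anchor and positive are two distinct songs by one artist; the
    negative is a random song by any other artist.
    """
    eligible = [artist for artist, songs in songs_by_artist.items() if len(songs) >= 2]
    artist = rng.choice(eligible)
    anchor, positive = rng.sample(songs_by_artist[artist], 2)
    other = rng.choice([a for a in songs_by_artist if a != artist])
    negative = rng.choice(songs_by_artist[other])
    return anchor, positive, negative


# Hypothetical catalogue used only for illustration.
catalogue = {
    "artist_a": ["a1", "a2", "a3"],
    "artist_b": ["b1", "b2"],
    "artist_c": ["c1"],
}
anchor, positive, negative = sample_triplet(catalogue, random.Random(0))
```

Artists with a single song can only supply negatives, since an anchor needs a second song by the same artist as its positive.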


Shared Manifold Learning Using a Triplet Network for Multiple Sensor Translation and Fusion with Missing Data

Dutt, Aditya, Zare, Alina, Gader, Paul

arXiv.org Artificial Intelligence

Heterogeneous data fusion can enhance the robustness and accuracy of an algorithm on a given task. However, due to the difference in various modalities, aligning the sensors and embedding their information into discriminative and compact representations is challenging. In this paper, we propose a Contrastive learning based MultiModal Alignment Network (CoMMANet) to align data from different sensors into a shared and discriminative manifold where class information is preserved. The proposed architecture uses a multimodal triplet autoencoder to cluster the latent space in such a way that samples of the same classes from each heterogeneous modality are mapped close to each other. Since all the modalities exist in a shared manifold, a unified classification framework is proposed. A comparison made with other methods demonstrates the superiority of this method.
... outstanding results on tasks like land-use and land-cover classification (LULC) [1] [2], mineral exploration [3] [4] [5], urban planning [6], biodiversity conservation [7], sentiment ... This method is also called decision fusion. In the context of a neural network, these representations are generated by the convolutional layers and fused gradually to form a shared representation layer. ... Fusion methods can be classified into two groups: concatenation and alignment-based methods. ... learn spatial information by using a structured morphological element of predefined size and shape. They proposed a graph-based model to couple the dimension reduction and fusion of information. However, using this method, the cloud-covered regions are not accurately classified because the morphological ... To increase the interpretability of fusion models, Hong et al. [27] proposed a shared and specific feature learning (S2FL) that is capable of decomposing data into modality-shared and modality-specific components, which enables a better information blending of multiple heterogeneous modalities.


Deep soccer captioning with transformer: dataset, semantics-related losses, and multi-level evaluation

Hammoudeh, Ahmad, Vanderplaetse, Bastein, Dupont, Stéphane

arXiv.org Artificial Intelligence

This work aims at generating captions for soccer videos using deep learning. In this context, this paper introduces a dataset, model, and triple-level evaluation. The dataset consists of 22k caption-clip pairs and three visual features (images, optical flow, inpainting) for ~500 hours of SoccerNet videos. The model is divided into three parts: a transformer learns language, ConvNets learn vision, and a fusion of linguistic and visual features generates captions. The paper suggests evaluating generated captions at three levels: syntax (the commonly used evaluation metrics such as BLEU-score and CIDEr), meaning (the quality of descriptions for a domain expert), and corpus (the diversity of generated captions). The paper shows that the diversity of generated captions has improved (from 0.07 reaching 0.18) with semantics-related losses that prioritize selected words. Semantics-related losses and the utilization of more visual features (optical flow, inpainting) improved the normalized captioning score by 28%. The web page of this work: https://sites.google.com/view/soccercaptioning